Author’s Roles

Emily Drew Johnson analyzed results, plotted data, fit growth curves and generated the report.

Acknowledgements

DA Campbell wrote code for analyses and plotting.
L Barney conducted experiment. This report is based upon collective data accumulated by the BIOL2201 Class, 2021, Mount Allison University.

Run This Chunk First

Additional libraries needed to run the project.

Introduction

Microbes grow; Mould is (sort of) a microbe. We will import microbial growth data, tidy it, plot growth data vs. time and fit models of microbial growth to the data.

Some models for microbial growth

Each model defines a pattern of growth. Linear: microbial population increases by a constant amount per unit time. Two parameters; intercept (P0) and linear slope Pt = ehours*linearslope + P0

Exponential: microbial population increases by a constant proportion per unit time. Two parameters; intercept (P0) and exponential growth rate constant Pt = P0 (exp(exprateehours))

Exponential rise towards a plateau: the gap between a maximum population and the current microbial population decreases by a constant proportion per unit time. Three parameters; intercept (P0), exponential growth rate constant and plateau Pt = Max - ((Max-Intercept)exp(x-mu)

Logistic: microbial population initially increases exponentially but slows as population approaches a plateau, Pmax. Three parameters; intercept (P0), exponential rate constant and maximum population Pmax. Pt= {Pmax e^µt }/{Pmax + P0 (e^µt- 1)}

linear_eqn_test <- function(x, slope = 0.005, Intercept = 5){x*slope + Intercept
}
ggplot(data.frame(x=c(1, 3000)), aes(x=x)) + 
  stat_function(fun = linear_eqn_test) +
  labs(title = "Linear") +
  theme_bw()

exp_rise_eqn_test <- function(x, mu = 0.005, Intercept = 5){Intercept *exp(x*mu)
}
ggplot(data.frame(x=c(1, 3000)), aes(x=x)) + 
  stat_function(fun = exp_rise_eqn_test) +
  labs(title = "Exponential") +
  theme_bw()

exp_rise_plateau_eqn_test <- function(x, mu = 0.005, Intercept = 0.5, Max = 10000){Max - ((Max-Intercept)*exp(x*-mu))
}
ggplot(data.frame(x=c(1, 3000)), aes(x=x)) + 
  stat_function(fun = exp_rise_plateau_eqn_test) +
  labs(title = "Exponential towards Plateau") +
  theme_bw()

logistic_rise_eqn_test <- function(x, mu = 0.005, Intercept = 5, Max = 10000){(Max*Intercept*exp(mu*x))/(Max + (Intercept*(exp(mu*x)-1)))
}

ggplot(data.frame(x = c(1, 3000)), aes(x=x)) + 
  stat_function(fun = logistic_rise_eqn_test) +
  labs(title = "Logistic") +
  theme_bw()

Materials and Methods

We grew mould on slices of bread under wrapping in the dark (or in the light, or at low and high temperature…).

Set your initials code for your MicroColonyGrowth data.

MyInitials <- c("EmJo")

Import data.

The import function may open a Google log in to get an authorization code in a separate browser window. Copy/paste the authorization code into the ‘Console’ window below. See next chunk for manual alternative if needed; download .csv from GoogleSheet, then upload the .csv to RStudio.cloud.

gs4_deauth()
ColonyData <- read_sheet("https://docs.google.com/spreadsheets/d/1DWyASxmu5DURkcegWM0Sykp9x3F3eQfMxdz0VYJ9mOI/edit?usp=sharing")

ColonyData

Alternative Doug: Manually Download Data from Googlesheets as .csv. Doug: Manually Upload to RStudio.cloud. Read in .csv into ColonyData object.

# ColonyData <- read_csv("MicroColonyData2021.csv", col_names = TRUE)
# ColonyData

Filter the class data to include only ‘MyData’ (defined by ‘MyInitials’ set above). Convert the data to long format for analyses. Convert the YYYYMMDDHHMM data column to a formatted datetime column, Generate a numeric ehours column for elapsed time.

#filter data,long format for analyses
MyData <- ColonyData %>%
  filter(Initials_4letter == MyInitials) %>%
  pivot_longer(cols = colnames(ColonyData[5:7]), names_to = 'Colonies', values_to = "Areamm2") %>%
  mutate(Areamm2 = as.numeric(Areamm2)) %>%
  drop_na()

#convert YYYYMMDDHHMM to datetime format
MyData <- MyData %>%
  group_by(Substrate, Treatment) %>%
  mutate(YYYYMMDDHHMM = ymd_hm(YYYYMMDDHHMM)) %>%
  mutate(ehours = as.numeric(((YYYYMMDDHHMM - min(YYYYMMDDHHMM)))/3600))

Results

Upload representative colony images to your RStudio.cloud. Add representative images of your colony growth over time to your .Rmd. Will work with .JPG and .png; not sure about other graphic formats.

knitr::include_graphics(file.path("202101262155.jpg"))

Example of Mould Colony on Naan, 202101240946.JPG

Plot the data with a separate facet for each colony.

Or, plot the data with all colonies under a common combination of substrate and treatment in a single panel.

#set Y axis limit range
#YLIM = c(0,max(MyData$Areamm2, na.rm = TRUE))

MyData %>%
  ggplot() +
  geom_point(aes(x = ehours, y = Areamm2, colour = Colonies)) +
  facet_grid(cols = vars(Treatment), rows = vars(Substrate)) +
  theme_bw() +
  labs(title = "Colony Growth",
       subtitle = MyInitials)

MyData %>%
  ggplot() +
  geom_point(aes(x = ehours, y = Areamm2, colour = Colonies)) +
  facet_grid(cols = vars(Colonies), rows = vars(Substrate, Treatment)) +
  theme_bw() +
  labs(title = "Colony Growth",
       subtitle = MyInitials)

Colony Size vs. Hours Elapsed since First Measure.

Plot the ln(Areamm2) to check whether growth follows an exponential pattern Default log in R is log base E, not log base 10.

MyData %>%
  ggplot() +
  geom_point(aes(x = ehours, y = log(Areamm2), colour = Colonies)) +
  facet_grid(cols = vars(Colonies), rows = vars(Substrate, Treatment)) +
  theme_bw() +
  labs(title = "Colony Growth",
       subtitle = MyInitials)

My Favourite Caption

Let R find the best fit parameters for models of the data.

Create R function for linear equation.

#define a linear equation as a function.
#x will be taken from 'ehours' when we run the fit.
linear_eqn <- function(x, m, intercept){(x * m) + intercept
}
#define starting, lower, and upper bounds for fit parameters to constrain linear equations
linear_eqn_start <- list(m = 1, intercept = min(MyData$Areamm2, na.rm = TRUE))
linear_eqn_lower <- c(0,0)

linear_eqn_upper <- c(max(MyData$Areamm2, na.rm = TRUE), max(MyData$Areamm2, na.rm = TRUE))

Create R function for exponential equation.

#define an exponential equation as a function.
#x will be taken from 'ehours' when we run the fit.
exp_eqn <- function(x, mu, intercept){intercept * exp(x*mu)
}
#define starting, lower, and upper bounds for fit parameters to constrain linear equations
exp_eqn_start <- list(mu = 0.1, intercept = min(MyData$Areamm2, na.rm = TRUE))
exp_eqn_lower <- c(0,0)

exp_eqn_upper <- c(1, max(MyData$Areamm2, na.rm = TRUE))

Create R function for logistic equation.

#define a logistic equation as a function.
#x will be taken from 'ehours' when we run the fit.
logistic_eqn <- function(x, pmax, mu, intercept){(pmax*intercept*exp(mu*x))/(pmax + (intercept*(exp(mu*x)-1)))
}
#define starting, lower, and upper bounds for fit parameters to constrain logistic fit
logistic_eqn_start<-list(pmax = max(MyData$Areamm2, na.rm = TRUE), mu = 0.05, intercept = min(MyData$Areamm2, na.rm = TRUE))

logistic_eqn_lower<-c((max(MyData$Areamm2, na.rm = TRUE) * 0.5),0.001,((min(MyData$Areamm2, na.rm = TRUE) * 0.1)))

logistic_eqn_upper<-c((max(MyData$Areamm2, na.rm = TRUE) * 2),1,((min(MyData$Areamm2, na.rm = TRUE) * 4)))

Each user measured multiple colonies, possibly on multiple substrates with multiple treatments. We create a ‘nested’ data frame where all data in columns YYYYMMDDHHMM, Areamm2, ehours, from a single user, a single colony, on a single substrate, with a single treatment are ‘nested’ together for later fitting of growth curves.

MyData_nest <- as_tibble(MyData) %>%
  #filter(Treatment != "Hot") %>%
  nest(data = c(YYYYMMDDHHMM, Areamm2, ehours, Temp_C))

Fit and plot treatment colony specific linear growth trajectories using nest purrr:map & broom::augment

This chunk uses complicated code from the ‘Tidyverse’ package. R will iteratively vary the parameters of the fitting equations to minimize the residuals (discrepancies) between the data points (nested) and the points predicted by the model with a given set of parameters.

colony_lin <- MyData_nest %>% 
  mutate(
  fit = map(data, ~nlsLM(Areamm2 ~ linear_eqn(x = ehours, m, intercept), data = .x, start = linear_eqn_start, lower = linear_eqn_lower, upper = linear_eqn_upper)),
  predict = map(fit,augment),
  tidied = map(fit, tidy),
  param = map(fit, glance)
  )

colony_lin %>%
  unnest(predict)  %>% 
  ggplot() +  
  geom_line(aes(x = ehours, y = .fitted), size = 0.5) +
geom_ribbon(aes(x = ehours, ymin = (.fitted - .resid), ymax = (.fitted + .resid), alpha = 0.1),show.legend = FALSE) +
  facet_grid(cols = vars(Colonies)) +
  geom_point(data = MyData, aes(x = ehours, y = Areamm2), colour = "green") +
  theme_bw() +
  labs(y = ".fitted Areamm^2",
       title = "Colony Growth, Linear Fits",
       subtitle = MyInitials,
       caption = "Linear line, residuals grey ribbon")

#parameters of colony specific linear fits
colony_lin %>% 
  unnest(tidied) %>%
  select(-c(data, fit, predict, param)) %>%
  mutate_if(is.numeric, round, digits = 2)

Fit and plot treatment colony specific exponential growth trajectories using nest purrr:map & broom::augment

colony_exp <- MyData_nest %>% 
  mutate(
  fit = map(data, ~nlsLM(Areamm2 ~ exp_eqn(x = ehours, mu, intercept), data = .x, start = exp_eqn_start, lower = exp_eqn_lower, upper = exp_eqn_upper)),
  predict = map(fit,augment),
  tidied = map(fit, tidy),
  param = map(fit, glance)
  )

colony_exp %>%
  unnest(predict)  %>% 
  ggplot() +  
  geom_line(aes(x = ehours, y = .fitted), size = 0.5) +
geom_ribbon(aes(x = ehours, ymin = (.fitted - .resid), ymax = (.fitted + .resid), alpha = 0.1),show.legend = FALSE) +
  facet_grid(rows = vars(Colonies)) +
  geom_point(data = MyData, aes(x = ehours, y = Areamm2), colour = "green") +
  theme_bw() +
  labs(y = ".fitted Areamm^2",
       title = "Colony Growth, Exponential Fits",
       subtitle = MyInitials,
       caption = "Exponential line, residuals grey ribbon")

#parameters of colony specific exponential fits
colony_exp %>% 
  unnest(tidied) %>%
  select(-c(data, fit, predict, param)) %>%
  mutate_if(is.numeric, round, digits = 2)

Fit and plot treatment colony specific logistic growth trajectories using nest purrr:map & broom::augment

colony_log <- MyData_nest  %>% 
  mutate(
  fit = map(data, ~nlsLM(Areamm2 ~ logistic_eqn(x = ehours, pmax, mu, intercept), data = .x, start = logistic_eqn_start, lower = logistic_eqn_lower, upper = logistic_eqn_upper)),
  predict = map(fit,augment),
  tidied = map(fit, tidy),
  param = map(fit, glance)
  )

colony_log %>% 
  unnest(predict) %>%
  ggplot() +  
  geom_line(aes(x = ehours, y = .fitted), size = 0.5) +
geom_ribbon(aes(x = ehours, ymin = (.fitted - .resid), ymax = (.fitted + .resid), alpha = 0.1),show.legend = FALSE) +
  geom_point(data = MyData, aes(x = ehours, y = Areamm2), colour = "green") +
  facet_grid(cols = vars(Colonies)) +
  theme_bw() +
  labs(y = ".fitted Areamm^2",
       title = "Colony Growth, Logistic Fits",
       subtitle = MyInitials,
       caption = "Logistic line, residuals grey ribbon")

#parameters of colony specific logistic fits
colony_log %>%
  unnest(tidied) %>%
  select(-c(data, fit, predict, param)) %>%
  mutate_if(is.numeric, round, digits = 2)

Compare Linear, Exponential, and Logistic Models using ANOVA

Doug to implement if feasible; too slow, too long

Organize all class data

ClassData <- ColonyData %>%
  pivot_longer(cols = colnames(ColonyData[5:7]), names_to = 'Colonies', values_to = "Areamm2") %>%
  filter(Areamm2 != "NULL") %>%
  mutate(Areamm2 = as.numeric(Areamm2)) %>%
  mutate(YYYYMMDDHHMM = ymd_hm(YYYYMMDDHHMM)) %>%
  group_by(Initials_4letter) %>%
  mutate(ehours = as.numeric((YYYYMMDDHHMM - min(YYYYMMDDHHMM))/3600)) %>% 
  filter(ehours < 320) %>%
  drop_na()

Plot class data facet by Substrate

ClassDataPlot <- ClassData %>%
  ggplot() +
  geom_point(aes(x = ehours, y = Areamm2, colour = Initials_4letter)) +
  facet_grid(rows = vars(Substrate), cols = vars(Treatment)) +
  theme_bw() +
  labs(title = "Class Colony Growth")

 ClassDataPlot 

Exploratory plot of all combinations of substrate and treatment run by the class.

Class_nest <- as_tibble(ClassData) %>%
  nest(data = c(Temp_C, Initials_4letter, YYYYMMDDHHMM, Areamm2, ehours, Colonies))

Fit logistic model to pooled class data, nested by Substrate and Treatment

Not working yet; some data ‘nests’ have too little data to fit or too little variation in the data to support a valid fit.

#define starting, lower, and upper bounds for fit parameters to constrain logistic fit
logistic_eqn_start<-list(pmax = max(ClassData$Areamm2, na.rm = TRUE), mu = 0.05, intercept = min(ClassData$Areamm2, na.rm = TRUE))

logistic_eqn_lower<-c((max(ClassData$Areamm2, na.rm = TRUE) * 0.5),0.001,((min(ClassData$Areamm2, na.rm = TRUE) * 0.1)))

logistic_eqn_upper<-c((max(ClassData$Areamm2, na.rm = TRUE) * 2),1,((min(ClassData$Areamm2, na.rm = TRUE) * 4)))

#fit a pooled model using the logistic equation nested by temp_C

Class_log <- Class_nest %>%
  mutate(
  fit = map(data, ~nlsLM(Areamm2 ~ logistic_eqn(x = ehours, pmax, mu, intercept), data = .x, start = logistic_eqn_start, lower = logistic_eqn_lower, upper = logistic_eqn_upper)),
  predict = map(fit,augment),
  tidied = map(fit, tidy),
  param = map(fit, glance)
  )

Class_log %>%
  unnest(predict) %>%
  ggplot() +
  geom_point(aes(x = ehours, y = Areamm2)) +
  geom_line(aes(x = ehours, y = .fitted), size = 0.5) +
geom_ribbon(aes(x = ehours, ymin = (.fitted - .resid), ymax = (.fitted + .resid), alpha = 0.1),show.legend = FALSE) +
  facet_grid(cols = vars(Substrate), rows = vars(Treatment)) +
  theme_bw() +
  labs(y = "Areamm^2",
       title = "Colony Growth, Logistic Fits",
       caption = "Logistic line, residuals grey ribbon, points data")

#parameters of colony specific logistic fits
Class_log %>%
  unnest(tidied) %>%
  select(-c(data, fit, predict, param)) %>%
  mutate_if(is.numeric, round, digits = 2)

Discussion

Emily’s passion for mould led her to an illustrious career as a mycologist or baker. Emily’s mould colony growth was best approximated using an exponential curve fit because her data had not reached a plateau at the time of analyses.